================================================================================
REPLICATION PACKAGE README
================================================================================

Paper Title: Does Pricing of Internet Usage Steer Consumers or Meter Usage? 
Evidence from a Pricing Experiment
Authors: Brian McManus, Aviv Nevo, Zachary Nolan, Jonathan W. Williams
Journal: Review of Economics and Statistics

Date: December 18, 2025

================================================================================
DATA AVAILABILITY STATEMENT
================================================================================

The data used in this replication package are proprietary and cannot be publicly
shared due to a data use agreement with the data provider. The dataset contains
confidential subscriber-level information from an internet service provider (ISP).

DATA ACCESS AND USE RESTRICTIONS:

Access to the data was governed by a data use agreement that included:
1. Restriction of data use to academic research only
2. Maintenance of confidentiality of individual subscriber information
3. Prohibition against attempting to re-identify individual subscribers or
   geographic markets
4. Requirement to preserve the anonymity of the data provider in all published
   materials
5. No requirement for the data provider to review results prior to publication

INFORMATION FOR RESEARCHERS SEEKING SIMILAR DATA:

Researchers interested in obtaining similar data for future research should be
prepared to enter into comparable data use agreements with ISP data providers.
The proprietary nature of the data means that this specific dataset cannot be
made available through this replication package.

The input data for this analysis consist of:
- Subscriber-date level panel data spanning nine months
- Billing information including plan choices and service tiers
- Menu of available plans with prices and features (speeds, usage allowances)
- Application-specific internet usage

See the DATA DICTIONARY section below for detailed variable descriptions.

================================================================================
COMPUTATIONAL REQUIREMENTS
================================================================================

SOFTWARE REQUIREMENTS:

- R version 4.0.0 or higher (tested with R 4.1.0)

  Required R packages:
  - data.table (version 1.14.0 or higher)
  - tidyverse (version 1.3.1 or higher)
  - ggplot2 (version 3.3.5 or higher)
  - latex2exp (version 0.9.4 or higher)
  - lubridate (version 1.7.10 or higher)
  - MASS (version 7.3-54 or higher)
  - gtable (version 0.3.0 or higher)
  - grid (base R package)
  - reshape2 (version 1.4.4 or higher)
  - doParallel (version 1.0.16 or higher)
  - LowRankQP (version 1.0.5 or higher)
    Note: LowRankQP may no longer be available from the main CRAN repository.
    If install.packages("LowRankQP") fails, install it from the read-only CRAN
    mirror on GitHub:
    install.packages("devtools")
    devtools::install_github("cran/LowRankQP")
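
Before running any script, it can help to confirm that each package listed
above is installed at the stated minimum version. The following is a minimal
sketch, not part of the replication package: check_packages() is a
hypothetical helper, and the version strings are copied from the list above
(grid is omitted because it ships with base R).

```r
# Minimum versions copied from the Software Requirements list.
required <- c(
  data.table = "1.14.0",
  tidyverse  = "1.3.1",
  ggplot2    = "3.3.5",
  latex2exp  = "0.9.4",
  lubridate  = "1.7.10",
  MASS       = "7.3-54",
  gtable     = "0.3.0",
  reshape2   = "1.4.4",
  doParallel = "1.0.16",
  LowRankQP  = "1.0.5"
)

# check_packages() is a hypothetical helper, not part of the package.
# It returns "ok", "missing", or "too old" for each named package.
check_packages <- function(required) {
  vapply(names(required), function(pkg) {
    if (!requireNamespace(pkg, quietly = TRUE)) return("missing")
    if (utils::packageVersion(pkg) < required[[pkg]]) return("too old")
    "ok"
  }, character(1))
}

status <- check_packages(required)
print(status[status != "ok"])  # packages that still need attention
```

An empty result means every listed package meets its minimum version.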

================================================================================
DIRECTORY STRUCTURE
================================================================================

replication/
│
├── README.txt                      (this file)
├── 00_paths.R                      (directory setup)
├── 00_prepdat.R                    (data preparation - step 1)
├── 00_save_pre_post_data.R         (construct pre/post outcomes)
├── 00_descriptive.R                (descriptive statistics and figures)
├── 01_prepdat.R                    (data preparation - step 2)
├── 02_fixed_lambda_SE.R            (penalized synthetic control estimation)
├── 03_fixed_lambda_SE.R            (counterfactual calculations)
├── 04_figures.R                    (main results figures)
│
├── functions/                      (supporting functions)
│   ├── wsoll1.R                    (L1-penalized weighted regression)
│   ├── regsynth.R                  (regularized synthetic control)
│   ├── regsynthpath.R              (regularization path)
│   ├── pensynth_parallel.R         (parallel estimation wrapper)
│   ├── TZero.R                     (threshold small weights to zero)
│   ├── estimator_matching.R        (matching estimator)
│   ├── get_stats.R                 (summary statistics)
│   └── LowRankQP.R                 (quadratic programming solver)
│
├── data/                           (input data - NOT PROVIDED)
│   └── subdate_final.csv           (subscriber-date level panel)
│
├── inputs/                         (intermediate files - created by scripts)
├── results/                        (estimation results - created by scripts)
└── figures/                        (output figures - created by scripts)

================================================================================
DATA DICTIONARY
================================================================================

INPUT DATA FILES (PROPRIETARY - NOT PROVIDED):

1. subdate_final.csv
   - Unit of observation: Subscriber-Date

   Variables:
   - customer_key: Unique subscriber identifier (anonymized)
   - date: Calendar date (YYYY-MM-DD format)
   - mkt: Market identifier (anonymized)
   - tot_gb: Total data usage in gigabytes on given date
   - gb_video: Video data usage in gigabytes
   - gb_browsing: Web browsing data usage in gigabytes
   - gb_gaming: Gaming data usage in gigabytes
   - gb_sharing: File sharing data usage in gigabytes
   - gb_other: Other data usage in gigabytes
   - gb_netflix: Netflix data usage in gigabytes
   - gb_youtube: YouTube data usage in gigabytes
   - gb_hulu: Hulu data usage in gigabytes
   - gb_slingtv: Sling TV data usage in gigabytes
   - gb_linear: Linear streaming TV (Hulu + Sling TV) in gigabytes
   - gb_othervideo: Other video streaming services in gigabytes
   - vid_flag: Indicator for video service subscription (0/1)
   - svc_tier: Internet service tier (integer from min_tier to max_tier)
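
Once the proprietary file has been obtained, its header can be compared with
the variable list above. The sketch below assumes the file is comma-separated
and that its column names match the dictionary exactly; missing_columns() is
a hypothetical helper, not part of the replication package.

```r
# Column names copied from the data dictionary above.
expected_cols <- c(
  "customer_key", "date", "mkt", "tot_gb",
  "gb_video", "gb_browsing", "gb_gaming", "gb_sharing", "gb_other",
  "gb_netflix", "gb_youtube", "gb_hulu", "gb_slingtv", "gb_linear",
  "gb_othervideo", "vid_flag", "svc_tier"
)

# missing_columns() is a hypothetical helper, not part of the package.
# It reads only the first line of the CSV and returns the expected
# column names that do not appear in it.
missing_columns <- function(path, expected) {
  header <- scan(path, what = character(), sep = ",", nlines = 1, quiet = TRUE)
  setdiff(expected, header)
}

path <- file.path("data", "subdate_final.csv")
if (file.exists(path)) {
  print(missing_columns(path, expected_cols))  # character(0) if none are missing
}
```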

MASKED PARAMETERS:

The following parameters are set to NA in the code to protect the data
provider's identity. These would need to be filled in with actual values from
the data provider to run the replication:

In 00_prepdat.R:
- last_pre_date: Last date of pre-announcement period
- last_pre_mon: Last month of pre-announcement period

In 01_prepdat.R and 00_save_pre_post_data.R:
- start_announce_mon: First month of announcement period
- end_announce_mon: Last month of announcement period
- end_treat_mon: Last month of treatment period

In 03_fixed_lambda_SE.R and 04_figures.R:
- tier_allow: Vector of data allowances by tier (GB per month)
- p_tu: Price per overage unit ($)
- q_tu: Quantity of GB per overage unit
- p_i: Vector of internet plan prices by tier ($)
- p_t: Price of TV service add-on ($)
- days_in_m: Days in billing month
- max_ovr: Maximum overage charge ($)
- month_start: First month index for counterfactual calculations
- month_end: Last month index for counterfactual calculations
- n_tier: Number of service tiers
- min_tier: Minimum tier number
- max_tier: Maximum tier number
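
To illustrate how the overage parameters might fit together, the sketch below
shows one plausible reading of tier_allow, p_tu, q_tu, and max_ovr: usage
above the allowance is billed in blocks of q_tu GB at p_tu dollars per block,
capped at max_ovr. This is an assumption for illustration only. The function
overage_charge() is hypothetical, the rounding rule is a guess, the numeric
values are made up, and the actual calculation is defined in
03_fixed_lambda_SE.R.

```r
# overage_charge() is a hypothetical helper, not taken from the package.
overage_charge <- function(usage_gb, allowance_gb, p_tu, q_tu, max_ovr) {
  excess_gb <- pmax(usage_gb - allowance_gb, 0)  # GB above the allowance
  units <- ceiling(excess_gb / q_tu)             # assumption: partial blocks billed in full
  pmin(units * p_tu, max_ovr)                    # assumption: charge capped at max_ovr
}

# Made-up values for illustration; the real values are masked.
charges <- overage_charge(
  usage_gb     = c(250, 320, 900),
  allowance_gb = 300,
  p_tu         = 10,
  q_tu         = 50,
  max_ovr      = 100
)
print(charges)
```

With these made-up inputs, the first subscriber stays under the allowance,
the second is billed one block, and the third hits the cap.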

================================================================================
INSTRUCTIONS FOR REPLICATION
================================================================================

STEP 0: SETUP

1. Obtain the proprietary data from the ISP (see Data Availability Statement)

2. Set the working directory in 00_paths.R (line 3) to the location where you
   have placed the replication files

3. Create a data/ subdirectory and place the following file in it:
   - subdate_final.csv

4. Install required R packages (see Software Requirements section above)

5. Fill in the masked parameters (see MASKED PARAMETERS section above)
   with the actual values supplied by the data provider
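
Because the masked parameters ship as NA, a script can fail far from the
line where a value was left unfilled. A small guard such as the one below can
list parameters that still contain NA. This is a sketch only: find_masked()
is a hypothetical helper, and the values shown are placeholders rather than
the real parameters.

```r
# find_masked() is a hypothetical helper, not part of the package.
# It returns the names of parameters that are empty or contain any NA.
find_masked <- function(params) {
  is_masked <- vapply(
    params,
    function(x) length(x) == 0 || anyNA(x),
    logical(1)
  )
  names(params)[is_masked]
}

# Placeholder values for illustration; real values come from the data provider.
params <- list(
  n_tier     = 3,
  min_tier   = 1,
  max_tier   = 3,
  tier_allow = c(NA, NA, NA),
  p_tu       = NA
)

masked <- find_masked(params)
if (length(masked) > 0) {
  message("Still masked: ", paste(masked, collapse = ", "))
}
```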

STEP 1: DATA PREPARATION

Run the following scripts in order:

1. 00_save_pre_post_data.R
   - Constructs pre-period and post-period outcomes for each subscriber
   - Creates: data/ubp_outcomes.csv

2. 00_prepdat.R
   - Prepares subscriber-date and subscriber-month panels
   - Identifies treatment and control groups
   - Creates: inputs/subdate_*.csv and inputs/submonth_*.csv

3. 01_prepdat.R
   - Creates analysis sample (treated units + random sample of controls)
   - Constructs matching variables
   - Creates 200 bootstrap resamples for standard error estimation
   - Creates: results/pensynth/final/inputs/data.csv
   - Creates: results/pensynth/final/inputs/resamp/data_*.csv (200 files)

STEP 2: ESTIMATION

4. 02_fixed_lambda_SE.R
   - Runs penalized synthetic control estimation for all bootstrap samples
   - This script is designed for parallel execution on a computing cluster
   - Uses SLURM_ARRAY_TASK_ID environment variable for job array parallelization
   - Creates: results/pensynth/final/fixed/[lambda]_SE/[samp_id]/[group]_[unit].csv
   - To run locally: Set task_id manually (line 84) and loop over all task IDs

   Note: The script divides work across 400 jobs (200 samples × 2 jobs per sample)

5. 03_fixed_lambda_SE.R
   - Computes counterfactual outcomes using estimated synthetic control weights
   - Calculates expected overage charges
   - Runs for main sample (task_id=0) and bootstrap samples (task_id=1:200)
   - Creates: results/pensynth/final/fixed/[lambda]_SE/[task_id]-cf.csv
   - To run locally: Set task_id manually (line 77) and loop over 0:200
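
As an alternative to editing task_id by hand for each run, a local loop can
set the SLURM_ARRAY_TASK_ID environment variable before each call. The sketch
below assumes the script reads that variable with Sys.getenv() and that any
manual task_id assignment in the script has been removed; run_for_tasks() is
a hypothetical helper, not part of the replication package.

```r
# run_for_tasks() is a hypothetical helper, not part of the package.
# For each ID it sets SLURM_ARRAY_TASK_ID, calls run_one(), and clears
# the variable again when the loop finishes or fails.
run_for_tasks <- function(task_ids, run_one) {
  on.exit(Sys.unsetenv("SLURM_ARRAY_TASK_ID"), add = TRUE)
  for (id in task_ids) {
    Sys.setenv(SLURM_ARRAY_TASK_ID = as.character(id))
    run_one()
  }
  invisible(task_ids)
}

# Example usage (commented out because it starts the full sequential run):
# run_for_tasks(0:200, function() {
#   source("03_fixed_lambda_SE.R", local = new.env())
# })
```

The same pattern could be applied to 02_fixed_lambda_SE.R with that script's
own range of task IDs. Running every task sequentially on one machine is
likely to be slow compared with a cluster job array.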

STEP 3: DESCRIPTIVE STATISTICS AND FIGURES

6. 00_descriptive.R
   - Generates descriptive statistics tables and figures
   - Creates:
     * Table 1: Usage and plan choice summary statistics
     * Table 2: UBP response - usage outcomes
     * Figure 3a-3d: Plan changes over time (4 subfigures)
   - Saves: figures/table1.tex, figures/table2.tex, figures/Figure3*.png

7. 04_figures.R
   - Generates all main results figures
   - Set just_graph=1 (line 10) to skip simulation loop and use pre-computed
     smoothed data
   - Creates:
     * Figure 4a: TV Add Probability
     * Figure 4b: TV Drop Probability
     * Figure 4c: Tier Upgrade Probability
     * Figure 4d: Tier Downgrade Probability
     * Figure 5: Total Usage Response
     * Figure 6a: Usage Response (Non-Upgraders)
     * Figure 6b: Usage Response (Upgraders)
     * Figure 7a: Usage by Category (Non-Upgraders)
     * Figure 7b: Usage by Category (Upgraders)
     * Figure 7c: Video Service Usage (Non-Upgraders)
     * Figure 7d: Video Service Usage (Upgraders)
     * Figure 8a: Monthly Bill Impact
     * Figure 8b: Monthly Bill by Component
     * Figure B.1: Expected Overage Distribution
   - Saves: figures/Figure*.png

   Note: To regenerate the smoothed data files from scratch, set just_graph=0.
   This takes significantly longer because it performs kernel smoothing on all
   200 bootstrap samples.

================================================================================
FIGURE AND TABLE CORRESPONDENCE
================================================================================

The following files create results that appear in the paper:

TABLES:
- Table 1 (Usage and Plan Choice): 00_descriptive.R → figures/table1.tex
- Table 2 (UBP Response - Usage): 00_descriptive.R → figures/table2.tex

MAIN TEXT FIGURES:
- Figure 3 (Plan Changes):
  * Panel A: 00_descriptive.R → figures/Figure3a.png
  * Panel B: 00_descriptive.R → figures/Figure3b.png
  * Panel C: 00_descriptive.R → figures/Figure3c.png
  * Panel D: 00_descriptive.R → figures/Figure3d.png

- Figure 4 (Plan Choice Responses):
  * Panel A: 04_figures.R → figures/Figure4a.png (TV Add)
  * Panel B: 04_figures.R → figures/Figure4b.png (TV Drop)
  * Panel C: 04_figures.R → figures/Figure4c.png (Tier Upgrade)
  * Panel D: 04_figures.R → figures/Figure4d.png (Tier Downgrade)

- Figure 5 (Usage Response): 04_figures.R → figures/Figure5.png

- Figure 6 (Usage by Upgrade Status):
  * Panel A: 04_figures.R → figures/Figure6a.png (Non-Upgraders)
  * Panel B: 04_figures.R → figures/Figure6b.png (Upgraders)

- Figure 7 (Usage by Category):
  * Panel A: 04_figures.R → figures/Figure7a.png (Categories, Non-Upgraders)
  * Panel B: 04_figures.R → figures/Figure7b.png (Categories, Upgraders)
  * Panel C: 04_figures.R → figures/Figure7c.png (Video Services, Non-Upgraders)
  * Panel D: 04_figures.R → figures/Figure7d.png (Video Services, Upgraders)

- Figure 8 (Bill Impact):
  * Panel A: 04_figures.R → figures/Figure8a.png (Total Bill)
  * Panel B: 04_figures.R → figures/Figure8b.png (Bill Components)

APPENDIX FIGURES:
- Figure B.1 (Overage Distribution): 04_figures.R → figures/FigureB1.png

================================================================================
REFERENCES
================================================================================

The penalized synthetic control methodology used in this paper builds on:

Abadie, A., & L’Hour, J. (2021). A Penalized Synthetic Control Estimator for 
Disaggregated Data. Journal of the American Statistical Association, 116(536), 
1817–1834.

================================================================================
CONTACT INFORMATION
================================================================================

For questions about the replication package, please contact:
Zachary Nolan
znolan@arizona.edu
University of Arizona
